DIMENSION REDUCTION AND SHRINKAGE

 

1. VARIABLE SELECTION

 

> library(ISLR)                                   # the Auto dataset lives in the ISLR package

> attach(Auto)

> library(leaps)

> reg.fit = regsubsets( mpg ~ cylinders + displacement + horsepower + weight + acceleration + year, Auto )

> summary(reg.fit)

Selection Algorithm: exhaustive

         cylinders displacement horsepower weight acceleration year

1  ( 1 ) " "       " "          " "        "*"    " "          " "

2  ( 1 ) " "       " "          " "        "*"    " "          "*"

3  ( 1 ) " "       " "          " "        "*"    "*"          "*"

4  ( 1 ) " "       "*"          " "        "*"    "*"          "*"

5  ( 1 ) "*"       "*"          " "        "*"    "*"          "*"

6  ( 1 ) "*"       "*"          "*"        "*"    "*"          "*"

 

# For each p = number of independent variables, this command finds the best model, i.e., the one with the lowest residual sum of squares (RSS).
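
# The summary object also stores the comparison criteria used below (rsq, rss, adjr2, cp, bic):

> names( summary(reg.fit) )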

 

# Next, choose the best p according to some criterion:

 

> summary(reg.fit)$adjr2                          # adjusted R2

[1] 0.6918423 0.8071941 0.8071393 0.8067872 0.8067841 0.8062826

> summary(reg.fit)$cp                             # Mallows' Cp

[1] 232.396144   1.169751   2.284200   3.992019   5.000800   7.000000

> summary(reg.fit)$bic                            # BIC = Bayesian information criterion

[1] -450.5016 -629.3564 -624.2828 -618.6081 -613.6448 -607.6743
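
# We can read off the best p under each criterion directly; here all three point to p = 2:

> which.max( summary(reg.fit)$adjr2 )             # highest adjusted R2

[1] 2

> which.min( summary(reg.fit)$cp )                # lowest Cp

[1] 2

> which.min( summary(reg.fit)$bic )               # lowest BIC

[1] 2

> coef( reg.fit, 2 )                              # coefficients of the best two-variable model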

 

# Recall that plain R2 is not a fair measure for comparing models of different sizes: it can only increase as p grows:

 

> summary(reg.fit)$rsq

[1] 0.6926304 0.8081803 0.8086190 0.8087638 0.8092549 0.8092553
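
# Indeed, each successive value is larger than the previous one:

> diff( summary(reg.fit)$rsq )                    # all increments are positive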

 

 

# For forward selection or backward elimination, use method="forward" or method="backward".

 

> library(MASS)

> reg = regsubsets( medv ~ ., data=Boston, method = "backward" )

 

# There is a nice way to visualize the results, ranking models by the chosen "scale": black means the variable is included in the model, white means it is excluded.

 

> plot(reg)

> plot(reg, scale = "adjr2" )
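
# The default scale is "bic"; "Cp" and "r2" are also available:

> plot(reg, scale = "Cp" )

> plot(reg, scale = "r2" )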

 


 

# To see more models, use option "nbest", which is the number of models of each size p to be compared.

 

> reg = regsubsets( medv ~ ., data=Boston, method = "backward", nbest=4 )

> plot(reg, scale = "adjr2" )

 

 

# We can also choose a model with step(), a stepwise search that minimizes AIC, starting from one model and moving toward another (direction can be "forward", "backward", or "both").

 

> null = lm( medv ~ 1, data=Boston )

> full = lm( medv ~ ., data=Boston )

> step( null, scope=list(lower=null, upper=full), direction="forward" )

Start:  AIC=2246.51

medv ~ 1

 

          Df Sum of Sq   RSS    AIC

+ lstat    1   23243.9 19472 1851.0             # Compare contributions of

+ rm       1   20654.4 22062 1914.2             # remaining independent variables

+ ptratio  1   11014.3 31702 2097.6

+ indus    1    9995.2 32721 2113.6

+ tax      1    9377.3 33339 2123.1

+ nox      1    7800.1 34916 2146.5

+ crim     1    6440.8 36276 2165.8

+ rad      1    6221.1 36495 2168.9

+ age      1    6069.8 36647 2171.0

+ zn       1    5549.7 37167 2178.1

+ black    1    4749.9 37966 2188.9

+ dis      1    2668.2 40048 2215.9

+ chas     1    1312.1 41404 2232.7

<none>                 42716 2246.5

 

Step:  AIC=1851.01

medv ~ lstat

 

          Df Sum of Sq   RSS    AIC

+ rm       1    4033.1 15439 1735.6

+ ptratio  1    2670.1 16802 1778.4

+ chas     1     786.3 18686 1832.2

+ dis      1     772.4 18700 1832.5

+ age      1     304.3 19168 1845.0

+ tax      1     274.4 19198 1845.8

+ black    1     198.3 19274 1847.8

+ zn       1     160.3 19312 1848.8

+ crim     1     146.9 19325 1849.2

+ indus    1      98.7 19374 1850.4

<none>                 19472 1851.0

+ rad      1      25.1 19447 1852.4

+ nox      1       4.8 19468 1852.9

 

... < truncated > ...

 

Step:  AIC=1585.76

medv ~ lstat + rm + ptratio + dis + nox + chas + black + zn +

    crim + rad + tax

 

        Df Sum of Sq   RSS    AIC

<none>               11081 1585.8

+ indus  1   2.51754 11079 1587.7

+ age    1   0.06271 11081 1587.8

 

Call:

lm(formula = medv ~ lstat + rm + ptratio + dis + nox + chas +

    black + zn + crim + rad + tax, data = Boston)

 

Coefficients:

(Intercept)        lstat           rm      ptratio          dis          nox 

  36.341145    -0.522553     3.801579    -0.946525    -1.492711   -17.376023 

       chas        black           zn         crim          rad          tax 

   2.718716     0.009291     0.045845    -0.108413     0.299608    -0.011778 

 

 

# The final model contains variables lstat, rm, ptratio, dis, nox, chas, black, zn, crim, rad, and tax.
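
# step() returns the final fitted lm object, so the selection can be saved and reused; a minimal sketch (trace=0 suppresses the step-by-step printout):

> step.fit = step( null, scope=list(lower=null, upper=full), direction="forward", trace=0 )

> summary(step.fit)                               # the usual lm summary of the selected model

> predict( step.fit, newdata=Boston[1:5, ] )      # predictions from the selected model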

 

 

 

2. RIDGE REGRESSION

 

# The "Boston" dataset has some strongly correlated predictors, resulting in multicollinearity.

 

> cor(Boston)
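
# For example, a small sketch to list the strongly correlated pairs (the 0.7 cutoff is an arbitrary choice):

> cc = cor(Boston)

> cc[ upper.tri(cc, diag=TRUE) ] = 0              # keep each pair only once

> which( abs(cc) > 0.7, arr.ind=TRUE )            # e.g., rad and tax are strongly correlated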

 

# Apply ridge regression with lm.ridge() from MASS. In the printed coefficients, the first, unlabeled value is the intercept.

 

> lm.ridge( medv~., data=Boston, lambda=0.5 )

 

                       crim            zn         indus          chas

 36.270091954  -0.107524388   0.046092635   0.018815242   2.693518778

          nox            rm           age           dis           rad

-17.646013524   3.816019080   0.000578687  -1.469191794   0.301928336

          tax       ptratio         black         lstat

 -0.012134702  -0.950831885   0.009309388  -0.523845421
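
# For comparison, the ordinary least-squares coefficients (lambda=0 would reproduce them exactly); ridge pulls most slopes slightly toward 0:

> round( coef( lm(medv ~ ., data=Boston) ), 6 )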

 

 

# We can see how the estimated slopes change with the penalty lambda: lambda=0 gives the least-squares estimates, and large lambda forces the slopes toward 0.

 

> rr = lm.ridge( medv ~ ., data=Boston, lambda=seq(0,1000,1) )

> rr

> plot(rr)
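
# plot(rr) draws one curve per slope against lambda. The same curves can be drawn by hand from rr$coef (rows = predictors, columns = the lambda grid, on the standardized-predictor scale):

> matplot( rr$lambda, t(rr$coef), type="l", xlab="lambda", ylab="slope" )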

 

 

# To choose a good lambda, compare performance across a grid of lambda values. select() reports the HKB and L-W estimates of lambda and the lambda with the smallest generalized cross-validation (GCV) error:

 

> select(rr)

modified HKB estimator is 4.321483

modified L-W estimator is 3.682107

smallest value of GCV  at 5

 

# So, the best lambda is around 5. We'll look closer at that range.

 

> rr = lm.ridge( medv~., data=Boston, lambda=seq(0,10,0.01) )

> select(rr)

modified HKB estimator is 4.594163

modified L-W estimator is 3.961575

smallest value of GCV  at 4.26

> plot(rr$lambda,rr$GCV)
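
# Finally, refit at the GCV-minimizing lambda; a minimal sketch:

> best = rr$lambda[ which.min(rr$GCV) ]           # 4.26 here

> rr.best = lm.ridge( medv ~ ., data=Boston, lambda=best )

> coef(rr.best)                                   # shrunken coefficients at the chosen lambda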